Selectional preference acquisition through sparse principal component analysis
Abstract
Words in an utterance are not placed in their respective slots randomly from a uniform distribution. In English, for example, a verb will rarely, if ever, follow a determiner. This is a syntactic restriction. From another perspective, one would not expect to find a word such as defenestration as the object of eat. This is known as the selectional preference of a word for another word in terms of its semantic domain. When this selectional preference is restricted to the arguments that a word may take, it is called its subcategorization frame [11]. In this paper, I lay out a novel, minimally supervised approach to inducing subcategorization frames for verbs using sparse principal component analysis (SPCA) [18], a method that is based not on the traditional singular value decomposition but on regression techniques. Applying a state-of-the-art parser for English [4] to a very large corpus [3], I extract all verb-direct object, verb-indirect object, and verb-subject pairings and build a verb-by-argument matrix for each type of pairing. After converting the nonzero elements of the matrices to Z-scores and mean-centering them, I perform SPCA on these matrices and transform the verbs into the argument space through the principal components. The similarity between a transformed verb and an argument is interpreted as the strength of the selectional preference. The output is therefore not a hard clustering but an ordered soft clustering that lists arguments for a given verb along with a numeric value for the preference. Previous unsupervised and minimally supervised approaches to semantics in NLP have made use of a technique called latent semantic analysis (LSA). In the following sections, I first discuss LSA and the problems associated with it, then provide a brief overview of previous approaches to selectional preference acquisition. I then present sparse PCA and how it is used in the current experiment. I conclude with an evaluation and discussion of the findings.
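The pipeline described in the abstract (count matrix, Z-scoring of nonzero cells, mean-centering, sparse components, projection of verbs into the argument space, ranking by similarity) can be illustrated with a minimal sketch. Several things here are assumptions rather than details from the paper: the toy counts and word lists are invented, the Z-scores are pooled over all nonzero cells, the sparse loadings come from a simple soft-thresholded power iteration used as a stand-in for the regression-based SPCA of [18], and cosine similarity against each argument axis is used as the similarity measure. The names `zscore_nonzero`, `sparse_loadings`, `rank_arguments`, and the parameters `k` and `lam` are hypothetical.

```python
import numpy as np

# Toy verb-by-direct-object counts (hypothetical data, not from the paper).
verbs = ["eat", "drink", "read"]
args = ["apple", "bread", "water", "juice", "book", "paper"]
counts = np.array([
    [10, 8, 1, 0, 0, 0],
    [0, 1, 9, 7, 0, 0],
    [0, 0, 0, 0, 12, 6],
], dtype=float)


def zscore_nonzero(m):
    """Z-score the nonzero cells only (assumption: pooled over the matrix)."""
    out = m.copy()
    mask = m != 0
    vals = m[mask]
    std = vals.std()
    out[mask] = (vals - vals.mean()) / std if std > 0 else 0.0
    return out


def soft_threshold(v, lam):
    """Shrink entries toward zero; entries with |v| <= lam become exactly 0."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def sparse_loadings(x, k, lam, n_iter=100):
    """Stand-in for SPCA: soft-thresholded power iteration with deflation."""
    residual = x.copy()
    cols = []
    for _ in range(k):
        # Start from the leading right singular vector of the residual.
        v = np.linalg.svd(residual, full_matrices=False)[2][0]
        for _ in range(n_iter):
            u = residual @ v
            u_norm = np.linalg.norm(u)
            if u_norm == 0:
                break
            v_new = soft_threshold(residual.T @ (u / u_norm), lam)
            v_norm = np.linalg.norm(v_new)
            if v_norm == 0:
                v = v_new
                break
            v = v_new / v_norm
        cols.append(v)
        # Remove the part of the data explained by this component.
        residual = residual - np.outer(residual @ v, v)
    return np.column_stack(cols)  # shape: (n_args, k)


x = zscore_nonzero(counts)
x = x - x.mean(axis=0)                 # mean-center each argument column
loadings = sparse_loadings(x, k=2, lam=0.1)
scores = x @ loadings                  # verbs in component space
back = scores @ loadings.T             # verbs mapped into argument space
norms = np.linalg.norm(back, axis=1, keepdims=True)
norms[norms == 0] = 1.0                # avoid division by zero
prefs = back / norms                   # cosine with each argument axis


def rank_arguments(verb):
    """Return (argument, preference) pairs, strongest preference first."""
    row = prefs[verbs.index(verb)]
    order = np.argsort(-row)
    return [(args[i], float(row[i])) for i in order]


ranked = rank_arguments("eat")
print(ranked)
```

The result for each verb is an ordered list of arguments with a numeric score, which matches the "ordered soft clustering" the abstract describes. The scores from a three-verb toy matrix carry no linguistic weight.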
Related resources
A New IRIS Segmentation Method Based on Sparse Representation
Iris recognition is one of the most reliable methods for identification. In general, it consists of image acquisition, iris segmentation, feature extraction and matching. Among them, iris segmentation has an important role in the performance of any iris recognition system. Nonlinear eye movement, occlusion, and specular reflection are the main challenges for any iris segmentation method. In thi...
Sparse Structured Principal Component Analysis and Model Learning for Classification and Quality Detection of Rice Grains
In scientific and commercial fields associated with modern agriculture, the categorization of different rice types and determination of its quality is very important. Various image processing algorithms are applied in recent years to detect different agricultural products. The problem of rice classification and quality detection in this paper is presented based on model learning concepts includ...
Word Sense Disambiguation For Acquisition Of Selectional Preferences
The selectional preferences of verbal predicates are an important component of lexical information useful for a number of NLP tasks including disambiguation of word senses. Approaches to selectional preference acquisition without word sense disambiguation are reported to be prone to errors arising from erroneous word senses. Large scale automatic semantic tagging of texts in sufficient quantit...
Principal Component Analysis as a Dimensionality Reduction Technique and Sparse Representation Classifier as a Post Classifier for the Classification of Epilepsy Risk Levels from EEG Signals
The main aim of this paper is to perform the analysis of Principal Component Analysis (PCA) as a Dimensionality Reduction technique and Sparse Representation Classifier (SRC) as a Post Classifier for the Classification of Epilepsy Risk levels from Electroencephalography signals. The data acquisition of the EEG signals is performed initially. Then PCA is applied here as a dimensionality reductio...
Publication date: 2008